EXPERIMENTAL DESIGNS FOR MULTIPLE-LEVEL RESPONSES,
WITH APPLICATION TO A LARGE-SCALE EDUCATIONAL
INTERVENTION

By Brenda Jenney and Sharon Lohr 

Arizona State University 

Educational research often studies subjects that are in naturally 
clustered groups of classrooms or schools. When designing a random- 
ized experiment to evaluate an intervention directed at teachers, but 
with effects on teachers and their students, the power or anticipated 
variance for the treatment effect needs to be examined at both levels. 
If the treatment is applied to clusters, power is usually reduced. At 
the same time, a cluster design decreases the probability of contam- 
ination, and contamination can also reduce power to detect a treat- 
ment effect. Designs that are optimal at one level may be inefficient 
for estimating the treatment effect at another level. In this paper 
we study the efficiency of three designs and their ability to detect a 
treatment effect: randomize schools to treatment, randomize teachers 
within schools to treatment, and completely randomize teachers to 
treatment. The three designs are compared for both the teacher and 
student level within the mixed model framework, and a simulation 
study is conducted to compare expected treatment variances for the 
three designs with various levels of correlation within and between 
clusters. We present a computer program that study designers can 
use to explore the anticipated variances of treatment effects under 
proposed experimental designs and settings. 

1. Introduction. Randomized experiments are frequently recommended 
for evaluating educational studies [Cook and Payne (2002)]. The
What Works Clearinghouse (2006, page 5) guidelines state: "Studies that
Meet Evidence Standards are well-designed and implemented randomized 
controlled trials." Boruch (2002), page 38, comments that although ran- 
domized trials have "been slow to come to the field of education," they 
provide the best way of ascertaining which interventions are truly effective
for helping students. Randomized studies can provide evidence of a causal
relationship between an intervention and results, and they are frequently 
cited by educational reformers [Gueron (2005)]. There are many ways, however,
of conducting a randomized trial, and the aim is a design
that provides as much information as possible using the available resources.
In this paper we study the efficiency of randomized designs for a situation 
in which teachers at multiple schools are randomized to treatments, and the 
impact of the educational treatment can be measured at multiple levels: in 
our case, at both the teacher and the student level. 

The research in this paper was motivated by the problem of evaluat- 
ing the effects of Project Pathways [CRESMET, Arizona State University 
(2007)]. Project Pathways is a professional development program for sec- 
ondary STEM (Science, Technology, Engineering and Mathematics) teach- 
ers, with the immediate goal of increasing teachers' conceptual and pedagog- 
ical knowledge of STEM topics. The primary intervention of Project Path- 
ways is a set of four courses taken by secondary STEM teachers in schools 
located in Maricopa County, Arizona. The courses are developed around the 
themes of functions and proportional reasoning. Course 1 concentrates on 
the mathematics, and courses 2-4 integrate the mathematical concepts with 
biology, physics, chemistry, geology and engineering. It is hypothesized that 
the teachers' increased understanding will lead to increased knowledge and 
achievement for the students who take classes from those teachers. Thus, 
while the treatment is administered to teachers, effects of the treatment 
need to be evaluated at both the teacher level and the student level. Stu- 
dents will likely take classes from several teachers over several semesters, so 
the data structure will not be completely hierarchical. In a completely hier- 
archical structure the students would be nested in teachers. An additional 
complication is that Project Pathways has no input on assigning students to 
teachers, so the design needs to be robust to possible self-selection by stu- 
dents. This research aims to provide some guidance for choice of randomized 
experimental design in multi-school studies such as Project Pathways when 
the effect of the intervention can be measured at more than one level. We 
develop theoretical results comparing efficiencies of designs in a general set- 
ting. We also present a computer program, multleveldesign, that designers 
can use to plan a study tailored to their circumstances. 

While there have been many small-scale randomized studies performed 
within schools, and many randomized studies relating to education programs 
that focus on students' physical and mental health, as well as tobacco, drug 
and alcohol use prevention programs, there have been relatively few whole 
school reforms tested with randomized studies [Cook (2003)]. As more em- 
phasis is placed on rigorous evaluations in education, the large-scale trials 
that are common in medical research should become more prevalent when 
studying educational innovations. One program currently under evaluation
by randomized study is the reading program Success for All, which currently
tracks the progress of students in 41 schools [Borman et al. (2005)]. Random- 
ized studies have been found to give more reliable results in other fields, such 
as psychiatry [Johnson (1998)] and criminal justice [Berk et al. (2003)]. Cook 
(2005) provides a synthesis of the most commonly encountered problems in 
cluster randomized designs, as well as the merits of cluster-based experi- 
ments in the social sciences. Reasons to consider cluster randomization are 
also explored by Gail et al. (1996). For a discussion of optimal design when 
budget constraints are present, see Moerbeek, van Breukelen and Berger (2000) 
and Raudenbush and Liu (2000). 

A number of researchers have studied the merits of different random- 
ized designs in the hierarchical setting when the response is measured at 
one level. Raudenbush (1997) gives guidance on how to optimally design 
cluster-randomized studies, based on a criterion of minimizing the stan- 
dard error of the treatment contrast. Bloom, Bos and Lee (1999) study the 
power of cluster-randomized designs and note that, given the same num- 
ber of individuals in a program, these types of designs produce a smaller 
effective sample size. Bloom, Bos and Lee (1999), Raudenbush (1997) and 
Bloom, Richburg-Hayes and Black (2005) advocate the use of covariates that 
are related to the measured response to increase the statistical power in 
cluster-randomized designs. Moerbeek, van Breukelen and Berger (2000) de- 
rive the relative efficiency of randomize-by-school vs. randomize-teacher- 
within-school designs for evaluating the impact of treatment on teachers. 
Moerbeek (2005) studies the effect of contamination — when some teachers 
in the control group adopt the intervention method — on power. The effi- 
ciency of designs for multiple response levels when the data structure is not 
completely hierarchical, however, has not been previously explored. 

To provide guidance on study design for multilevel-response studies of 
this type, we look at the efficiencies for measuring program impact on teach- 
ers and students for three randomization schemes for the intervention: (1) 
randomly assigning schools, with all their teachers and students, to the ex- 
perimental or control groups, (2) randomly assigning half of the teachers 
within each school to the experimental group (a randomized block design 
for teachers), and (3) randomly assigning teachers regardless of school to the 
two groups (a completely randomized design for teachers). In Section 2 we 
construct a unified mixed model framework for responses of teachers and stu- 
dents, and in Section 3 we study the relative efficiencies of the three designs 
for assessing treatment effects at teacher and student levels. In Section 4 
we explore effects of possible treatment contamination, noncompliance or 
attrition on the relative efficiencies. Section 5 describes a computer program 
that may be used to simulate the distribution of the anticipated variance 
of the various designs for measuring the impact of the intervention on both
teachers and students. In Section 6 we discuss implications of the results for
evaluation design choice. 

In this paper we express all results in terms of educational experiments for 
ease of interpretation, but the research results apply to many other settings 
as well. A multicenter clinical trial involving a new physical therapy method 
may randomize therapists to treatments within centers, or may randomize 
entire centers with all their staffs to treatments. In this case, it is desired to 
measure effects of the intervention at both therapist and patient levels. Simi- 
larly, a study on most effective police response to domestic violence incidents 
may randomize treatment assignment at any of several levels: city, police 
station, police officer or incident. There may be multiple calls to the same 
household during the study period, so that a household may interact with 
several police officers. Responses of interest might include officers' knowl- 
edge of and actions about domestic violence incidents, subsequent domestic 
violence reports from a household in the study or incident characteristics. 
The responses thus occur at three levels of experimental units. 

2. Models for responses. The goal of the study is to evaluate effects of 
the intervention on teachers and simultaneously on students in their classes. 
With that goal in mind, we introduce models that could be used at each 
level of response. In the following, $I_k$ is the $k \times k$ identity matrix, $1_k$ is the
$k$-vector of all ones, and $J_k = 1_k 1_k'$. We assume that there are $a$ schools
available for the study, and that school $i$ has $m_i$ teachers and $n_i$ students
who could participate.

2.1. Teacher model. Let $T_{ij}$ be a response of interest for teacher $j$ at
school $i$. $T_{ij}$ might be, for example, the change score on an assessment of
content knowledge given before and after the intervention or, alternatively,
$T_{ijt}$ could be the score on the assessment at time $t$ in a longitudinal study.
If we use a change score as a response, a possible model for the teachers'
response is the mixed effects model

(2.1)  $T_i = X_i \beta + 1_{m_i} v_i + \varepsilon_i,$

where $T_i = (T_{i1}, T_{i2}, \ldots, T_{im_i})'$ is an $m_i$-vector of responses,
$X_i = [x_{i1}\ x_{i2}\ \cdots\ x_{im_i}]'$ is an $m_i \times p$ matrix of known covariates for the
teachers, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p$-vector of fixed effects,
$v_i \sim N(0, \sigma_v^2)$ is a random effect for the school, and
$\varepsilon_i \sim N(0, \sigma_\varepsilon^2 I_{m_i})$ is an $m_i$-vector of random error terms for the
teachers. Additionally, $v_i$ and $\varepsilon_{ij}$ are independent for all $i$ and $j$. We assume
in this model, then, that teachers from different schools are independent.
The last column of $X_i$ holds the treatment assignments; its entry for teacher $j$
from school $i$ is

$x_{ijp} = \begin{cases} 1, & \text{if in experimental group,} \\ -1, & \text{if in control group.} \end{cases}$

The last element of $\beta$, $\beta_p$, is the parameter of interest for assessing the effect
of the treatment on the teachers.

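Using mixed model theory [Demidenko (2004)], we have, for the setup in
(2.1),

$\mathrm{Cov}(T_i \mid X_i) = V_i = \sigma_v^2 J_{m_i} + \sigma_\varepsilon^2 I_{m_i}.$

Because the data for all schools are independent of one another, the information
for $\beta_p$ is the sum of the information from each school. The
information matrix of the generalized least squares estimator
$\hat{\beta} = \left(\sum_{i=1}^{a} X_i' V_i^{-1} X_i\right)^{-1} \sum_{i=1}^{a} X_i' V_i^{-1} T_i$ is

(2.2)  $\mathcal{I}_T(\beta, X) = \sum_{i=1}^{a} X_i' V_i^{-1} X_i,$

where $X = [X_1' \cdots X_a']'$ and

(2.3)  $V_i^{-1} = \frac{1}{\sigma_\varepsilon^2} \left[ I_{m_i} - \sigma_v^2 (\sigma_\varepsilon^2 + \sigma_v^2 m_i)^{-1} J_{m_i} \right].$

We are primarily interested in the $(p,p)$ entry of the matrix
$\mathrm{Cov}(\hat{\beta}) = [\mathcal{I}_T(\beta, X)]^{-1}$.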
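
As an illustration, the information in (2.2) can be accumulated school by school without inverting any matrix, because (2.3) gives $V_i^{-1}$ in closed form. The Python sketch below assumes the simplest case, in which each teacher's covariate vector holds only an intercept and the +1/-1 treatment code (so p = 2). The function names `teacher_information` and `treatment_variance`, the two example assignments and the variance values are illustrative choices and are not taken from the multleveldesign program.

```python
def teacher_information(assignments, sigma_v2, sigma_e2):
    """Information matrix (2.2) for X_i = [1, R_i], with R_i a list of +1/-1.

    assignments: one list of +1/-1 treatment codes per school.
    sigma_v2:    between-school variance.
    sigma_e2:    teacher-level error variance.
    Returns the 2 x 2 information matrix as nested lists.
    """
    info = [[0.0, 0.0], [0.0, 0.0]]
    for r in assignments:
        m = len(r)
        s = sum(r)  # 1'R_i
        # (2.3): V_i^{-1} = (I - k J) / sigma_e2 with k as below.
        k = sigma_v2 / (sigma_e2 + sigma_v2 * m)
        xtx = [[m, s], [s, m]]                   # X_i' X_i (R_i'R_i = m)
        xtjx = [[m * m, m * s], [m * s, s * s]]  # X_i' J X_i
        for u in range(2):
            for w in range(2):
                info[u][w] += (xtx[u][w] - k * xtjx[u][w]) / sigma_e2
    return info


def treatment_variance(info):
    """(2, 2) entry of the inverse of a 2 x 2 information matrix."""
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    if abs(det) < 1e-12:
        raise ValueError("treatment effect is not estimable for this design")
    return info[0][0] / det


# Hypothetical example: four schools with two teachers each.
by_school = [[1, 1], [1, 1], [-1, -1], [-1, -1]]      # whole schools in one arm
within_school = [[1, -1], [1, -1], [-1, 1], [1, -1]]  # one treated teacher each

var_by_school = treatment_variance(teacher_information(by_school, 1.6, 14.4))
var_within = treatment_variance(teacher_information(within_school, 1.6, 14.4))
```

With these illustrative inputs the by-school assignment gives a treatment variance of about 2.2 and the within-school assignment about 1.8, so the same eight teachers carry more information about the teacher-level effect when treatment varies within schools.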

2.2. Student model. During the span of the experiment, student $k$ at
school $i$ may have classes from one or more of the teachers in the project.
Let $Y_{ik}$ be a response measure for student $k$ at school $i$ for $k = 1, \ldots, n_i$.
One choice for $Y_{ik}$ might be a change score for an assessment given before
and after the intervention.

A model for $Y_{ik}$ needs to allow students to take multiple classes, and
to account for dependence among students in the same school and among
students who take classes from the same teacher. We propose the following
mixed model, related to a model in McCaffrey et al. (2004), for the student's
response. The response $Y_{ik}$ depends on characteristics of the student and on
characteristics of each teacher who instructs the student $(i,k)$:

(2.4)  $Y_i = B_i \gamma + D_i (X_i \theta + t_i) + 1_{n_i} s_i + \eta_i.$

Here, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$ is an $n_i$-vector of student-level responses,
$B_i = [b_{i1}\ b_{i2}\ \cdots\ b_{in_i}]'$ is an $n_i \times q$ matrix of known covariates for students in school
$i$, and $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_{q-1})'$ is a $q$-vector of fixed effects. The $n_i \times m_i$ matrix
$D_i$ describes the assignment of students to teachers: the $(k,j)$ element of $D_i$
is $d_{ikj}$ = number of classes student $k$ from school $i$ takes with teacher $j$. As in
Section 2.1, $X_i = [x_{i1}\ x_{i2}\ \cdots\ x_{im_i}]'$ is an $m_i \times p$ matrix of covariates for
teachers at school $i$, whose last column is a vector of treatment indicators.
The $p$-vector $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ is a vector of fixed effects for the teachers.
Since multiple students take classes from teacher $(i,j)$, we include a random
effects vector, $t_i = (t_{i1}, t_{i2}, \ldots, t_{im_i})'$, such that each $t_{ij} \sim N(0, \sigma_t^2)$. We posit
an additive model for the effects of teachers on an individual student with
element $k$ of $D_i(X_i \theta + t_i)$ representing the additive effect of all of the teachers
taken by student $(i,k)$. A student may take classes from any number of
the $m_i$ teachers in the school, and may have the same teacher for multiple
classes. The model also includes a random effect for the school, $s_i \sim N(0, \sigma_s^2)$,
and random error terms for the students, $\eta_i = (\eta_{i1}, \eta_{i2}, \ldots, \eta_{in_i})'$, such that
$\eta_{ik} \sim N(0, \sigma_\eta^2)$. Assume that $t_{ij}$, $s_i$ and $\eta_{ik}$ are mutually independent for all
values of the indices $i$, $j$ and $k$.

In practice, one would generally include only teachers participating in the 
experiment, or teaching classes related to the student outcomes, in the model 
in (2.4). Nonrelevant teachers — for example, physical education teachers in 
a study with mathematics outcomes — are likely to have little effect on the 
response through their physical education classes. If desired, alternative
formulations of the $D_i$ matrix can be used in model (2.4). For example, the
entry $d_{ikj}$ could be set to 1 if student $k$ takes at least one class with teacher
$j$ and 0 otherwise. With this formulation, a student who has the same teacher
for multiple classes receives that teacher's effect only once.

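The fixed effect vector, including the parameters at both student and
teacher levels, is $(\gamma', \theta')'$. The parameter of interest is $\theta_p$, corresponding to
the treatment effect. For the model in (2.4),

$\mathrm{Cov}(Y_i \mid D_i) = \Sigma_i = \sigma_s^2 J_{n_i} + \sigma_t^2 D_i D_i' + \sigma_\eta^2 I_{n_i}.$

When $(\gamma, \theta)$ is estimable, the generalized least squares estimator is

$\begin{pmatrix} \hat{\gamma} \\ \hat{\theta} \end{pmatrix} = \left( \sum_{i=1}^{a} \begin{bmatrix} B_i' \\ (D_i X_i)' \end{bmatrix} \Sigma_i^{-1} \begin{bmatrix} B_i & D_i X_i \end{bmatrix} \right)^{-1} \sum_{i=1}^{a} \begin{bmatrix} B_i' \\ (D_i X_i)' \end{bmatrix} \Sigma_i^{-1} Y_i.$

The information matrix for a given design $X$ and student assignment $D$
is

$\mathcal{I}_S(\gamma, \theta, X, D) = \sum_{i=1}^{a} \begin{bmatrix} B_i' \\ (D_i X_i)' \end{bmatrix} \Sigma_i^{-1} \begin{bmatrix} B_i & D_i X_i \end{bmatrix}.$

If there are no student-level covariates $B_i$, the information simplifies to

(2.5)  $\mathcal{I}_S(\theta, X, D) = \sum_{i=1}^{a} X_i' D_i' \Sigma_i^{-1} D_i X_i.$

For the student model, we are primarily interested in the $(p,p)$ entry of the
matrix $\mathrm{Cov}(\hat{\theta} \mid X, D)$, the variance for our additive treatment effect, which we
will approximate with $[\mathcal{I}_S(\theta, X, D)]^{-1}_{pp}$ (when the inverse exists). For specific
values of $D_i$, we can find the expected value of the information.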
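
To make (2.5) concrete, the sketch below builds $\Sigma_i$ from a class-assignment matrix $D_i$ and sums $X_i' D_i' \Sigma_i^{-1} D_i X_i$ over schools. It is written in plain Python with a small Gauss-Jordan inverse so that it has no dependencies; a numerical library would be the natural choice for realistic school sizes. All function names, the single hypothetical school and the variance values are illustrative assumptions, not part of the authors' program.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def student_covariance(d, sigma_s2, sigma_t2, sigma_eta2):
    """Sigma_i = sigma_s2 * J + sigma_t2 * D D' + sigma_eta2 * I."""
    n = len(d)
    ddt = matmul(d, transpose(d))
    return [[sigma_s2 + sigma_t2 * ddt[k][l] + (sigma_eta2 if k == l else 0.0)
             for l in range(n)] for k in range(n)]


def student_information(schools, sigma_s2, sigma_t2, sigma_eta2):
    """Information (2.5): sum over schools of X_i' D_i' Sigma_i^{-1} D_i X_i.

    schools: list of (D_i, X_i) pairs given as nested lists.
    """
    p = len(schools[0][1][0])
    total = [[0.0] * p for _ in range(p)]
    for d, x in schools:
        dx = matmul(d, x)
        sigma_inv = inverse(student_covariance(d, sigma_s2, sigma_t2, sigma_eta2))
        part = matmul(transpose(dx), matmul(sigma_inv, dx))
        total = [[u + w for u, w in zip(row_t, row_p)]
                 for row_t, row_p in zip(total, part)]
    return total


# One hypothetical school: 4 students, 2 teachers, each student takes one class.
D = [[1, 0], [1, 0], [0, 1], [0, 1]]
X = [[1, 1], [1, -1]]  # columns: intercept, treatment indicator
info = student_information([(D, X)], sigma_s2=1.0, sigma_t2=2.0, sigma_eta2=4.0)
```

In this example one of the two teachers is treated, and the treatment information is the (2, 2) entry `info[1][1]`, here about 0.5. Replacing `D` with a matrix in which students share teachers changes $\Sigma_i$ and hence the information, which is the dependence on class assignment discussed above.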
3. Efficiencies of randomization designs. In this section we examine the
variance of the treatment coefficient for teachers and students under three
possible designs. One design is held to be more efficient than another if the
variance of the treatment effect is smaller, given that the numbers of schools,
teachers and students are the same in each design. To facilitate theoretical
comparison of the designs, we make several simplifying assumptions. For
each design, assume that each of the $a$ schools has the same number of
teachers, $m_i = m$, where $m$ is even, and the same number of students, $n_i = n$.
This scenario is a good approximation if we assume that schools in the study
have been stratified by size, and models are fit separately to each stratum.
Also assume that $x_{ij}' = [1\ \ x_{ij2}]$, where $x_{ij2}$ is the treatment indicator, and
that there are no student covariates available. In practice, any important
available covariates should be used to improve the precision of the design,
and randomization is employed to remove residual biases [Cox and Reid
(2000), page 33]. In Section 5 we present a computer program that can be
used if the $m_i$'s and $n_i$'s are unequal.

In order to examine the variance of the treatment coefficient, we calcu- 
late the information matrix and the expected information for each design, 
for both the teacher-level and student-level responses. The inverse of the in- 
formation matrix is the covariance matrix for the estimated fixed effects in 
each model, when those effects are estimable. We work with the information 
matrix rather than the covariance matrix because some of the designs can 
lead to a singular information matrix. 

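For any randomization that is done at the teacher or school level, the
expected information depends on the $X_i$ matrix. Let $X_i = [1_m\ \vdots\ R_i]$,
where the $j$th element of $R_i$ is the treatment assignment of teacher $j$ from
school $i$:

(3.1)  $R_{ij} = \begin{cases} 1, & \text{if in experimental group,} \\ -1, & \text{if in control group.} \end{cases}$

For the teacher model in (2.1), when randomization is employed, equations
(2.2) and (2.3) show that the information is

(3.2)  $\mathcal{I}_T(\beta, X) = \sum_{i=1}^{a} \frac{1}{\sigma_\varepsilon^2} \left\{ \begin{bmatrix} m & 1'R_i \\ 1'R_i & m \end{bmatrix} - \frac{\sigma_v^2}{\sigma_\varepsilon^2 + \sigma_v^2 m} \begin{bmatrix} m^2 & m 1'R_i \\ m 1'R_i & 1'R_i R_i' 1 \end{bmatrix} \right\}.$

For the student model in (2.4), the information in (2.5) depends on how
students are assigned to classes within each school. For a given assignment
of students to teachers, the information is

$\mathcal{I}_S(\theta, X, D) = \sum_{i=1}^{a} \begin{bmatrix} 1' D_i' \Sigma_i^{-1} D_i 1 & 1' D_i' \Sigma_i^{-1} D_i R_i \\ R_i' D_i' \Sigma_i^{-1} D_i 1 & R_i' D_i' \Sigma_i^{-1} D_i R_i \end{bmatrix}.$

For each design, the expected information for $\theta_2$ for a given assignment of
students to teachers is

(3.3)  $E[\mathcal{I}_S(\theta_2, X, D) \mid D] = \sum_{i=1}^{a} \left\{ \mathrm{tr}[D_i' \Sigma_i^{-1} D_i \mathrm{Cov}(R_i)] + E[R_i]' D_i' \Sigma_i^{-1} D_i E[R_i] \right\}.$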



3.1. Design 1: randomize schools. In the first design we randomize at
the school level, assigning half of the available schools to each treatment. If
school $i$ is randomized to the treatment group, then all teachers in school
$i$ are in the treatment group and $R_i = 1_m$. Likewise, if school $i$ is randomized
to the control group, then $R_i = -1_m$. Thus, under this design,
$P(R_i = 1_m) = P(R_i = -1_m) = 1/2$. Also, $\sum_{i=1}^{a} 1'R_i = 0$, $R_i R_i' = J_m$, $E[R_i] = 0$, and
$\mathrm{Cov}(R_i) = J_m$.

For the teacher model, the information matrix from (2.2) and (3.2) for
any realization of randomization with an equal number of schools in each
treatment is

$\mathcal{I}_{T1}(\beta, X) = \frac{ma}{\sigma_\varepsilon^2 + \sigma_v^2 m}\, I_2.$

Therefore, the variance of the estimated treatment coefficient for the teacher
model when randomization is by school is $(\sigma_\varepsilon^2 + \sigma_v^2 m)/(ma)$.

For the student model, we have from equation (2.4) that the treatment
effect is $\theta_2$ if there are no student covariates and $\theta = (\theta_1, \theta_2)'$. When teachers
are randomized by school, the information of the treatment indicator from
(3.3) is

(3.4)  $\mathcal{I}_{S1}(\theta_2, X, D) = \sum_{i=1}^{a} \mathrm{tr}[D_i' \Sigma_i^{-1} D_i J_m].$

When all teachers in a school are randomized to the same treatment, the
information does not depend on $X$. Consequently, the expected information
is attained for every realization of the design.



3.2. Design 2: randomize teachers within schools. In the second design
we randomly assign half of the teachers at each school to the experimental
treatment, and the other half to the control treatment. With $m$ teachers
at each of the $a$ schools, each school will have the same form for the $X_i$
matrix, with $x_{ij2} = 1$ for half of the teachers, and $x_{ij2} = -1$ for the other
half of the teachers. For any vector $r_i$ with entries 1 and $-1$ and with
$1'r_i = 0$, we have $P(R_i = r_i) = \binom{m}{m/2}^{-1}$. Consequently, $1'R_i = 0$, $E[R_i] = 0$
and $\mathrm{Cov}(R_i) = (m I_m - J_m)/(m - 1)$.
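
The covariance formula for design 2 can be checked by brute force for a small school. The sketch below, in Python with exact rational arithmetic, enumerates every assignment of half of $m$ teachers to treatment, treats the assignments as equally likely, and compares the resulting mean and covariance of $R_i$ with $(m I_m - J_m)/(m - 1)$. The helper names and the choice m = 4 are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def within_school_assignments(m):
    """All +1/-1 vectors of length m with exactly m/2 entries equal to +1."""
    if m % 2:
        raise ValueError("m must be even")
    vectors = []
    for treated in combinations(range(m), m // 2):
        chosen = set(treated)
        vectors.append([1 if j in chosen else -1 for j in range(m)])
    return vectors


def mean_and_covariance(vectors):
    """Exact mean vector and covariance matrix under a uniform distribution."""
    count = len(vectors)
    m = len(vectors[0])
    mean = [Fraction(sum(v[j] for v in vectors), count) for j in range(m)]
    cov = [[Fraction(sum(v[j] * v[k] for v in vectors), count) - mean[j] * mean[k]
            for k in range(m)] for j in range(m)]
    return mean, cov


m = 4
mean, cov = mean_and_covariance(within_school_assignments(m))
# Closed form quoted in the text: (m I_m - J_m) / (m - 1).
closed_form = [[Fraction(m * (j == k) - 1, m - 1) for k in range(m)]
               for j in range(m)]
```

For m = 4 the six equally likely assignments give mean zero, unit variances and covariances of -1/3, in agreement with the closed form. The negative covariances reflect the constraint that exactly half of the teachers in each school are treated.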
For the teacher model, the information matrix from (3.2) for any realization
of this randomization is

$\mathcal{I}_{T2}(\beta, X) = ma \begin{bmatrix} (\sigma_\varepsilon^2 + \sigma_v^2 m)^{-1} & 0 \\ 0 & \sigma_\varepsilon^{-2} \end{bmatrix}.$

For the student model, the information for a specific randomization depends
on the assignment of students to classes. The information can be zero
in some cases, as will be shown in Section 3.5. The expected treatment
information over all randomizations for a given assignment of students to classes
is, from (3.3),

(3.5)  $E[\mathcal{I}_{S2}(\theta_2, X, D) \mid D] = \sum_{i=1}^{a} \frac{1}{m - 1}\, \mathrm{tr}[D_i' \Sigma_i^{-1} D_i (m I_m - J_m)].$

3.3. Design 3: completely randomized design of teachers. In the third
design we randomize half of the $ma$ teachers, regardless of school, to the
experimental group, with the other half in the control group. In Sections
3.1 and 3.2 the variance of $\hat{\beta}_p$ was the same for the teacher model regardless
of which schools or teachers were randomly assigned to the treatment
group. For the completely randomized design, school $i$ may have between
0 and $m$ teachers in the treatment group. Under this design, $E[R_i] = 0$ and
$\mathrm{Cov}(R_i) = (ma I_m - J_m)/(ma - 1)$.

For the teacher model, the expected information matrix from (3.2) for
this design is

$E[\mathcal{I}_{T3}(\beta, X)] = \frac{ma}{\sigma_\varepsilon^2 + \sigma_v^2 m} \begin{bmatrix} 1 & 0 \\ 0 & 1 + \frac{(m - 1)ma}{ma - 1} \frac{\sigma_v^2}{\sigma_\varepsilon^2} \end{bmatrix}.$

For the student model, the expected treatment information from (3.3) is

(3.6)  $E[\mathcal{I}_{S3}(\theta_2, X, D) \mid D] = \frac{1}{ma - 1} \sum_{i=1}^{a} \mathrm{tr}[D_i' \Sigma_i^{-1} D_i (ma I_m - J_m)].$

3.4. Comparison of designs. Table 1 gives the expected information for
the treatment coefficient for the designs discussed in Sections 3.1-3.3. Since
each expected information matrix is diagonal, the anticipated variance of $\hat{\beta}_2$
or $\hat{\theta}_2$ will be the reciprocal of the information (when it is nonzero).

For teacher-level assessments, as shown in Moerbeek, van Breukelen and Berger
(2000), the expected information is highest when randomization is done
within the schools. The information is lowest for design 1, when teachers
are randomized by school. Depending on the realization of randomization,
the efficiency of the completely randomized design will be between that of
randomizing by teacher within schools and randomizing by school.
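
The teacher-model expressions for the three designs are simple enough to tabulate directly. The following Python sketch evaluates them for given $a$, $m$ and variance components, under the equal-school-size assumption of this section. The function name and the input values are illustrative assumptions chosen only to show the ordering described above.

```python
def teacher_expected_information(a, m, sigma_v2, sigma_e2):
    """Expected treatment information at the teacher level for the three designs.

    a: number of schools; m: teachers per school (assumed equal across schools).
    Returns a dict keyed by design name.
    """
    base = m * a / (sigma_e2 + sigma_v2 * m)
    crd_factor = 1 + (m - 1) * m * a * sigma_v2 / ((m * a - 1) * sigma_e2)
    return {
        "by_school": base,
        "within_school": m * a / sigma_e2,
        "crd": base * crd_factor,
    }


# Illustrative values only: 20 schools, 6 teachers each.
info = teacher_expected_information(a=20, m=6, sigma_v2=4.8, sigma_e2=11.2)
anticipated_variance = {design: 1 / value for design, value in info.items()}
```

For these inputs the expected information is 3.0 by school, roughly 9.5 for the completely randomized design and roughly 10.7 within schools. The gap between the first and the other two widens as the between-school variance grows relative to the teacher-level variance.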

For student-level assessments, the expected information of $\theta_2$ given $D_i$ is
of the form

$\sum_{i=1}^{a} \mathrm{tr}[D_i' \Sigma_i^{-1} D_i (k_1 I_m + k_2 J_m)],$

for $k_1$, $k_2$ specified in Table 1. In general, $D_i$ depends on the assignment
of students to classes in each school, and a design that is most efficient
for teacher-level assessments may not be efficient for student-level
assessments. Because $\Sigma_i$ depends on $D_i$, in general, the expressions for expected
treatment information in the student model need to be computed numerically.
Simplifications are possible for some special cases of the student model
when $D_i$ has specified characteristics, and we examine one of these in the
next section.

Other criteria and models may be used when evaluating designs. Since we
are interested in measuring program impact on both teachers and students,
we may choose to think of the information as a weighted average of the
information from the two models: $\alpha \mathcal{I}_T(\beta_2, X) + (1 - \alpha) \mathcal{I}_S(\theta_2, X, D)$, where
$\alpha$ is chosen to reflect the relative importance of the two responses. The
setup may also be extended to allow more hierarchical factors. For example,
if schools are nested in districts, we could also consider randomizing the
treatment by district. With even fewer clusters, this design would be less
efficient for estimating teacher and student effects than design 1.

Table 1
Comparison of expected information for treatment variable

Randomization    Expected treatment information, teacher model    Expected treatment information, student model
By school        $\frac{ma}{\sigma_\varepsilon^2 + \sigma_v^2 m}$    $\sum_{i=1}^{a} \mathrm{tr}[D_i' \Sigma_i^{-1} D_i J_m]$
Within school    $\frac{ma}{\sigma_\varepsilon^2}$    $\frac{1}{m - 1} \sum_{i=1}^{a} \mathrm{tr}[D_i' \Sigma_i^{-1} D_i (m I_m - J_m)]$
CRD              $\frac{ma}{\sigma_\varepsilon^2 + \sigma_v^2 m} \left[ 1 + \frac{(m - 1)ma\,\sigma_v^2}{(ma - 1)\sigma_\varepsilon^2} \right]$    $\frac{1}{ma - 1} \sum_{i=1}^{a} \mathrm{tr}[D_i' \Sigma_i^{-1} D_i (ma I_m - J_m)]$

3.5. Information for student model when $D_i$ is balanced. In this section
we examine the special case when the assignment of students to classes is
done in a balanced way. We define a balanced assignment to be one in which
each student takes $c$ classes from different teachers, and each of the possible
$\binom{m}{c}$ assignments of $c$ teachers to a student occurs with the same number of
students. Consequently, in a balanced design, each class has $nc/m$ students.
We show in Supplement A [Jenney and Lohr (2008a)] that, for a balanced
assignment $D_i$,

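(3.7)  $\mathrm{tr}(D_i' \Sigma_i^{-1} D_i J_m) = \frac{c^2 n}{n \sigma_s^2 + c^2 \sigma_t^2 n/m + \sigma_\eta^2}$

and

(3.8)  $m\, \mathrm{tr}(D_i' \Sigma_i^{-1} D_i) = \mathrm{tr}(D_i' \Sigma_i^{-1} D_i J_m) + \frac{m(m - 1)nc(m - c)}{m(m - 1)\sigma_\eta^2 + nc(m - c)\sigma_t^2}.$

From (3.4) and (3.7), then, for any realization of the randomize-by-school
design,

$\mathcal{I}_{S1}(\theta_2, X, D) = \frac{a c^2 n}{n \sigma_s^2 + c^2 \sigma_t^2 n/m + \sigma_\eta^2}.$

From (3.5), (3.7) and (3.8) the expected information for the randomize
within schools design, given $D$, is

$E[\mathcal{I}_{S2}(\theta_2, X, D) \mid D] = \frac{amnc(m - c)}{nc(m - c)\sigma_t^2 + m(m - 1)\sigma_\eta^2}.$

From (3.6), (3.7) and (3.8) the expected information for the completely
randomized design, given $D$, is

$E[\mathcal{I}_{S3}(\theta_2, X, D) \mid D] = \frac{1}{ma - 1} \left( \frac{a(a - 1)mc^2 n}{mn \sigma_s^2 + nc^2 \sigma_t^2 + m \sigma_\eta^2} \right) + \frac{1}{ma - 1} \left( \frac{a^2 m(m - 1)nc(m - c)}{nc(m - c)\sigma_t^2 + m(m - 1)\sigma_\eta^2} \right).$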

3.5.1. Information when each student takes one class from each teacher.
When $c = m$, that is, each student takes a class from each of the $m$ teachers,
then

$\mathcal{I}_{S1}(\theta_2, X, D) = \frac{a m^2 n}{n \sigma_s^2 + nm \sigma_t^2 + \sigma_\eta^2},$

$E[\mathcal{I}_{S2}(\theta_2, X, D) \mid D] = 0,$

$E[\mathcal{I}_{S3}(\theta_2, X, D) \mid D] = \frac{a(a - 1)m^2 n}{(ma - 1)(n \sigma_s^2 + nm \sigma_t^2 + \sigma_\eta^2)}.$

Thus, in the design in which teachers are randomized within schools to
the treatment, we cannot even estimate the desired treatment effect at the
student level. In design 3 the expected information is positive, but it is
possible to have a randomization in which $\theta_2$ is not estimable. The expected
information from design 3 is less than that from design 1. When $c = m$, then,
the most efficient design for estimating $\theta_2$, the effect of the intervention on
the students, is design 1, in which all the teachers in a school are randomized
to the same treatment. When examining the effect on teachers, by contrast,
we found that design 1 was the least efficient design.

3.5.2. Information when each student takes one class. For the special 
case in which each student takes a class with one teacher and each class 
has the same number, n/m, of students, then the information under any 
realization of design 1 and the expected information under designs 2 and 3 
are 

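$\mathcal{I}_{S1}(\theta_2, X, D) = \frac{an}{n \sigma_s^2 + \sigma_t^2 n/m + \sigma_\eta^2},$

$E[\mathcal{I}_{S2}(\theta_2, X, D) \mid D] = \frac{an}{\sigma_t^2 n/m + \sigma_\eta^2},$

$E[\mathcal{I}_{S3}(\theta_2, X, D) \mid D] = \frac{1}{ma - 1} \left( \frac{a(a - 1)n}{n \sigma_s^2 + \sigma_t^2 n/m + \sigma_\eta^2} \right) + \frac{1}{ma - 1} \left( \frac{a^2 n(m - 1)}{\sigma_t^2 n/m + \sigma_\eta^2} \right).$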

In this case, design 2, when teachers are randomized within schools to the 
treatment, is the most efficient for estimating the treatment effect at the 
student level. Design 2 is also most efficient when examining the treatment 
effect at the teacher level. 



3.5.3. Information for student model in other situations. When $c$ is
between 2 and $m - 1$, the relative efficiency of design 1 to design 2 depends on
the values of the variance components $\sigma_s^2$ and $\sigma_\eta^2$. A necessary and sufficient
condition for

$E[\mathcal{I}_{S2}(\theta_2)] > \mathcal{I}_{S1}(\theta_2)$

is that

$n(m - c)\sigma_s^2 > m(c - 1)\sigma_\eta^2.$

In practice, we expect $n$ to be large relative to $m$ and $\sigma_s^2 > 0$, so in most
educational situations we would expect to have higher information for the
estimated treatment effect on students from design 2 than from design 1.
Note, though, that if $c$ is even, it is possible for a particular realization of
design 2 to have information 0.
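
The balanced-assignment formulas of Section 3.5 can be collected in a few lines of code, which makes it easy to see how the ranking of designs changes with $c$. The Python sketch below assumes the expressions (3.7) and (3.8) as we read them from the text, with equal school sizes and $m > 1$. The function names and the input values are illustrative and are not part of the multleveldesign program.

```python
def balanced_student_information(a, m, n, c, sigma_s2, sigma_t2, sigma_eta2):
    """Treatment information at the student level under a balanced D_i.

    a schools, m teachers and n students per school, c classes per student
    (1 <= c <= m). Returns (design 1, expected design 2, expected design 3).
    """
    # School-level piece, from (3.7): tr(D' Sigma^{-1} D J).
    tr_j = c * c * n / (n * sigma_s2 + c * c * sigma_t2 * n / m + sigma_eta2)
    # Within-school piece, the second term on the right of (3.8).
    within = (m * (m - 1) * n * c * (m - c)
              / (m * (m - 1) * sigma_eta2 + n * c * (m - c) * sigma_t2))
    design1 = a * tr_j
    design2 = a * within / (m - 1)
    design3 = (a * (a - 1) * tr_j + a * a * within) / (m * a - 1)
    return design1, design2, design3


def design2_beats_design1(m, n, c, sigma_s2, sigma_eta2):
    """Condition from Section 3.5.3 for E[I_S2] > I_S1."""
    return n * (m - c) * sigma_s2 > m * (c - 1) * sigma_eta2


# Illustrative values only: 10 schools, 6 teachers, 300 students, 3 classes.
args = dict(a=10, m=6, n=300, c=3, sigma_s2=2.0, sigma_t2=1.0, sigma_eta2=8.0)
d1, d2, d3 = balanced_student_information(**args)
agrees = (d2 > d1) == design2_beats_design1(6, 300, 3, 2.0, 8.0)
```

Setting `c` equal to `m` makes the design 2 value exactly zero, and setting `c` to 1 reproduces the single-class case of Section 3.5.2. For the illustrative inputs above, design 2 has the highest expected information and design 1 the lowest, consistent with the inequality in Section 3.5.3.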
4. Contamination, noncompliance and attrition. Often, because of the
nature of an intervention, a cluster-randomized trial is preferable to a com- 
pletely randomized design or randomization within clusters. Contamination 
may be a concern in any experiment where subjects are clustered. For exam- 
ple, in medical informatics studies on decision support systems, a clinician 
treating both control and intervention patients may gain knowledge from the 
system when treating intervention patients and apply that knowledge to the 
treatment of the control patients [Chuang, Hripcsak and Heitjan (2002)]. 
Social science interventions are often community-based, and also susceptible 
to control group contamination when the control and treatment groups are 
mixed within clusters. In an educational setting, it may be difficult to limit 
the contamination of control groups when the treatment is applied at the 
teacher level and teachers are randomized to the treatment within schools. 
A discussion among teachers about teaching methods or lesson plans could 
influence a control group teacher to adopt elements of the treatment when 
teaching. Control group contamination can decrease the measurable effect 
of the treatment. If control group contamination is expected to be moderate 
to large, a cluster randomized trial is recommended [Moerbeek (2005)]. 

Likewise, noncompliance to the treatment within the experimental group 
and attrition can also reduce power to detect treatment effects. An exper- 
imental group teacher can be noncompliant by following the control group 
treatment, resulting in more similarity between control and experimental 
group. In a multi-semester evaluation, teachers or students may change 
schools or drop out of the study for other reasons, and this attrition may af- 
fect the power of the original design. In a study of noncompliance, Jo (2002) 
notes that higher power to detect treatment effects can also be obtained for 
within-cluster randomization (design 2) when fewer subjects are assigned to 
the intervention (unbalanced design). 

Moerbeek (2005) studies effects of contamination on power in an "intent-to-treat"
analysis for the teacher-level model. For that model, she argues
that, for design 2, when $100q\%$ ($0 < q < 1$) of the control teachers follow
the experimental treatment instead, the effect on power is equivalent to
multiplying the variance of the estimated treatment coefficient by $(1 - q)^{-2}$.

In this section we explore effects of contamination when it is known which
teachers in the control group have been contaminated and explore effects of
multicollinearity on the expected information from the designs studied in
Section 3. Similar methodology could be applied to the treatment group
to model noncompliance. Suppose that the teacher model is as in Section
2.1, with the modification that $x_{ij}$ is a $(p + 1)$-vector of covariates for the
teacher. The additional covariate, the contamination indicator of an individual
teacher, can be described as a Bernoulli random variable, where the
probability of contamination depends on the ratio of treatment teachers
to control group teachers at a particular school. We will outline the model
where control group teachers are subject to contamination. Then the new
indicator, $C_{ij}$, equals 1 if teacher $j$ from school $i$ is in the control group
and is contaminated, and equals zero otherwise. Let $C_{ij} = (1 - R_{ij})Z_{ij}/2$,
where $Z_{ij} \sim \mathrm{Bin}(1, (1'R_i + m)q/m)$, and $R_{ij}$ is defined in (3.1). For design
2, $0 < q < 1$; for design 3, $0 < q < 1/2$. Then when $E[R_i] = 0$,

$E[C_i] = \frac{q}{2m} \left( m 1_m - \mathrm{Cov}(R_i) 1_m \right),$

which depends on the randomization design.

For the teacher model, we still examine the $p$th element of $\boldsymbol{\beta}$, which
is the parameter of interest for assessing the effect of the treatment, but the
estimate of the treatment effect may be influenced by the new contamination
covariate. In the simple model where there are no additional teacher covariates,
$\mathbf{X}_i = [\mathbf{1}_m \;\; \mathbf{R}_i \;\; \mathbf{C}_i]$, where $\mathbf{C}_i$ is the column of contamination
indicators.

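For design 1, randomize-schools, $\mathbf{C}_i = \mathbf{0}$ for all $i$ so contamination has
no effect. For design 2, $E[\mathbf{R}_i] = \mathbf{0}$, $\mathrm{Cov}(\mathbf{R}_i) = (m\mathbf{I}_m - \mathbf{J}_m)/(m - 1)$, and
$E[\mathbf{C}_i] = (q/2)\mathbf{1}_m$. For a general matrix $\mathbf{G}_i$,

\[
E[\mathbf{X}_i'\mathbf{G}_i\mathbf{X}_i \mid \mathbf{G}_i] = \mathbf{E} =
\begin{bmatrix}
\mathrm{tr}(\mathbf{G}_i\mathbf{J}_m) & 0 & \frac{q}{2}\mathrm{tr}(\mathbf{G}_i\mathbf{J}_m) \\
0 & \mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)] & -\frac{q}{2}\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)] \\
\frac{q}{2}\mathrm{tr}(\mathbf{G}_i\mathbf{J}_m) & -\frac{q}{2}\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)] & \mathrm{tr}\bigl[\mathbf{G}_i\bigl\{\frac{q^2}{4}[\mathbf{J}_m + \mathrm{Cov}(\mathbf{R}_i)] + \frac{q(1 - q)}{2}\mathbf{I}_m\bigr\}\bigr]
\end{bmatrix}.
\]

When the inverse exists, the (2, 2) entry of $\mathbf{E}^{-1}$ is

\[
\frac{1}{\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)]}
\left\{1 + \frac{q\,\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)]}{2(1 - q)\,\mathrm{tr}(\mathbf{G}_i)}\right\}.
\]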
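The expected information matrix for design 2 and the (2, 2) entry of its inverse can be checked numerically. The sketch below assumes the coding $R_{ij} = \pm 1$, a simple model with columns $[\mathbf{1}_m, \mathbf{R}_i, \mathbf{C}_i]$, and an illustrative compound-symmetric covariance matrix. It compares the trace expressions with a Monte Carlo average of $\mathbf{X}_i'\mathbf{G}_i\mathbf{X}_i$, and compares the closed form $\{1 + q\,\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)]/(2(1 - q)\mathrm{tr}(\mathbf{G}_i))\}/\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)]$ with a direct matrix inverse. All function names and numerical settings are hypothetical.

```python
import numpy as np

def analytic_information(G, q):
    """Expected 3x3 information for X = [1, R, C] under design 2."""
    m = G.shape[0]
    J = np.ones((m, m))
    cov_R = (m * np.eye(m) - J) / (m - 1)
    a, b, c = np.trace(G @ J), np.trace(G @ cov_R), np.trace(G)
    e33 = q**2 / 4 * (a + b) + q * (1 - q) / 2 * c
    return np.array([[a, 0.0, q / 2 * a],
                     [0.0, b, -q / 2 * b],
                     [q / 2 * a, -q / 2 * b, e33]])

def simulated_information(G, q, reps, rng):
    """Monte Carlo average of X' G X over random design-2 assignments."""
    m = G.shape[0]
    total = np.zeros((3, 3))
    for _ in range(reps):
        R = rng.permutation(np.repeat([1.0, -1.0], m // 2))
        Z = rng.binomial(1, q, size=m)   # Bernoulli(q), since R sums to zero
        C = (1 - R) * Z / 2
        X = np.column_stack([np.ones(m), R, C])
        total += X.T @ G @ X
    return total / reps

def inverse_entry_22(G, q):
    """Closed form for the (2, 2) entry of the inverse information."""
    m = G.shape[0]
    cov_R = (m * np.eye(m) - np.ones((m, m))) / (m - 1)
    b, c = np.trace(G @ cov_R), np.trace(G)
    return (1 + q * b / (2 * (1 - q) * c)) / b

m, q = 8, 0.5
V = 14.4 * np.eye(m) + 1.6 * np.ones((m, m))   # illustrative covariance matrix
G = np.linalg.inv(V)
E = analytic_information(G, q)
E_hat = simulated_information(G, q, reps=20000, rng=np.random.default_rng(1))
print(np.round(E, 4))
print(np.round(E_hat, 4))
print(np.linalg.inv(E)[1, 1], inverse_entry_22(G, q))
```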

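Taking $\mathbf{G}_i = \mathbf{V}^{-1}$, for the teacher model the anticipated variance per school
in design 2 is inflated by

\[
1 + \frac{q}{2(1 - q)}\left\{1 + \frac{\sigma_s^2}{\sigma_t^2 + (m - 1)\sigma_s^2}\right\}.
\]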
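The teacher-model inflation factor can be worked out step by step from the (2, 2) entry of the inverse information, whose ratio to the uncontaminated value $1/\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)]$ is $1 + q\,\mathrm{tr}[\mathbf{G}_i\mathrm{Cov}(\mathbf{R}_i)]/\{2(1 - q)\mathrm{tr}(\mathbf{G}_i)\}$. The derivation below assumes that the teacher-model covariance matrix for a school is compound symmetric, $\mathbf{V} = \sigma_t^2\mathbf{I}_m + \sigma_s^2\mathbf{J}_m$, with $\sigma_s^2$ the between-school and $\sigma_t^2$ the teacher-level variance component. The numerical line uses $m = 8$, $q = 0.5$, $\sigma_s^2 = 1.6$ and $\sigma_t^2 = 14.4$ purely as an illustration.

```latex
\begin{align*}
\mathbf{V}^{-1}
  &= \frac{1}{\sigma_t^2}\left\{\mathbf{I}_m
     - \frac{\sigma_s^2}{\sigma_t^2 + m\sigma_s^2}\,\mathbf{J}_m\right\}, \\
\mathrm{tr}(\mathbf{V}^{-1})
  &= \frac{m\{\sigma_t^2 + (m - 1)\sigma_s^2\}}
          {\sigma_t^2(\sigma_t^2 + m\sigma_s^2)}, \\
\mathrm{tr}[\mathbf{V}^{-1}\mathrm{Cov}(\mathbf{R}_i)]
  &= \frac{m}{\sigma_t^2}
     \qquad\text{because } \mathbf{J}_m(m\mathbf{I}_m - \mathbf{J}_m) = \mathbf{0}, \\
1 + \frac{q\,\mathrm{tr}[\mathbf{V}^{-1}\mathrm{Cov}(\mathbf{R}_i)]}
         {2(1 - q)\,\mathrm{tr}(\mathbf{V}^{-1})}
  &= 1 + \frac{q}{2(1 - q)}\cdot
     \frac{\sigma_t^2 + m\sigma_s^2}{\sigma_t^2 + (m - 1)\sigma_s^2}, \\
\text{illustration:}\quad
1 + \frac{0.5}{2(0.5)}\cdot\frac{14.4 + 8(1.6)}{14.4 + 7(1.6)}
  &= 1 + 0.5 \times \frac{27.2}{25.6} \approx 1.53 .
\end{align*}
```

Under these assumed settings, contamination with $q = 0.5$ would raise the anticipated variance per school in design 2 by roughly 53%.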

For the student model, taking Gj = DjS^^Dj for a balanced Dj as de- 
scribed in Section 3.5, and using equations (3.7) and (3.8), we have that the 
anticipated variance per school is inflated by 



1 + 



2(1 — q) {m — l)na'^ + nc^af + {m — l)/{m — c)a'^ 



For design 2, in each of the teacher and student models, the anticipated vari- 
ance increases due to the contamination. The anticipated variance inflation 
for design 3 can be calculated similarly.

Fig. 1. Teacher model: compare smoothed density of treatment variance for the three
randomization designs. Left panel: $\sigma_s^2 = 1.6$, $\sigma_t^2 = 14.4$. Right panel: $\sigma_s^2 = 4.8$, $\sigma_t^2 = 11.2$.
The randomize-schools design is less efficient when the between-school variance is a higher
proportion of total variance (right panel).



5. Examining the distribution of the anticipated sample variance. In
Sections 3 and 4 we derived expected information matrices for the teacher
and student models. For the student model, the information matrix depends 
in a complex way on the assignment of students to classes, and is analytically 
tractable only for special cases. In addition, it is possible for a specific real- 
ization of a design to have information that is far from its expected value. In 
this section we examine the distribution of the anticipated variance of the 
treatment effect computationally under various scenarios. 

The macro multleveldesign, which is written for use with SAS software
[SAS Institute Inc. (2008)] and is available in the supplementary material
file posted at the journal website [Jenney and Lohr (2008b)], estimates the
expected value and distribution of the anticipated variance of the treatment
effect for input values of $a$, $n_1, \ldots, n_a$, $m_1, \ldots, m_a$ and the variance
components. Unlike other programs such as Optimal Design [Liu et al. (2006);
Raudenbush and Liu (2000)], multleveldesign handles multiple response levels
and data that are not completely nested. The macro displays the distribution
of the sample variance of the treatment effect for simulated data
and gives the empirical power estimate for each randomization design of the
student and teacher model.

We illustrate the macro with settings based on pilot data from Project
Pathways. We plotted the simulated distribution of anticipated variance for
the treatment variable at teacher and student levels using anticipated numbers
of teachers and students in schools that would be available for the study.
Each simulation in Figure 1 and Figure 2 generates data for 16 schools, with
8 teachers and 200 students at each school, according to the assumptions for
the models in equations (2.1) and (2.4), with variance components stated in
the figure captions. We also assume that each student takes two classes from
teachers at their school, in order to generate the $\mathbf{D}$ matrix. For these examples,
we randomly selected $c = 2$ teachers with replacement for each student.
A study of efficiency of the models was conducted which set the intra-class
correlation for the teacher model, $\rho = \sigma_s^2/(\sigma_s^2 + \sigma_t^2)$, to be 0.1 or 0.3, and varied
the other inputs of number of schools, teachers per school and students
per school. Figure 3 and Figure 4 display results with model contamination
in both the teacher and student model, with $q = 0.5$ in the contamination
coefficient. Note that in Figure 4, the contamination can make designs 2 and
3 less efficient overall than design 1.
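The comparison of the three designs at the teacher level can be sketched in a few lines. The code below is not the SAS macro. It is a simplified illustration that treats the variance components as known, codes treatment as $+1$ and control as $-1$, omits contamination and covariates, and computes the generalized least squares variance of the treatment coefficient for each random assignment. The function names, seed and replication count are hypothetical.

```python
import numpy as np

def assign(design, a, m, rng):
    """Return an (a, m) array of +1 (treatment) / -1 (control) codes."""
    if design == 1:
        # Randomize schools: half of the a schools are treated.
        school = rng.permutation(np.repeat([1.0, -1.0], a // 2))
        return np.repeat(school[:, None], m, axis=1)
    if design == 2:
        # Randomize teachers within each school: m/2 treated per school.
        return np.array([rng.permutation(np.repeat([1.0, -1.0], m // 2))
                         for _ in range(a)])
    # Design 3: completely randomize teachers across all schools.
    return rng.permutation(np.repeat([1.0, -1.0], a * m // 2)).reshape(a, m)

def anticipated_variance(R, sigma_s2, sigma_t2):
    """GLS variance of the treatment coefficient for assignment array R,
    with per-school covariance sigma_t2 * I + sigma_s2 * J."""
    a, m = R.shape
    V = sigma_t2 * np.eye(m) + sigma_s2 * np.ones((m, m))
    info = np.zeros((2, 2))
    for i in range(a):
        X = np.column_stack([np.ones(m), R[i]])
        info += X.T @ np.linalg.solve(V, X)
    return np.linalg.inv(info)[1, 1]

rng = np.random.default_rng(2008)
a, m = 16, 8                      # schools and teachers per school
for design in (1, 2, 3):
    draws = [anticipated_variance(assign(design, a, m, rng), 1.6, 14.4)
             for _ in range(1000)]
    print(design, round(min(draws), 4), round(float(np.mean(draws)), 4),
          round(max(draws), 4))
```

With known variance components and equal school sizes, designs 1 and 2 give the same value for every assignment, $(\sigma_t^2 + m\sigma_s^2)/(am)$ and $\sigma_t^2/(am)$ respectively, while design 3 varies with the realized imbalance within schools. Recoding treatment as 0/1 multiplies each of these variances by 4.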

Given the possibility of control group contamination in Project Pathways, 
and based on the explorations in Figures 1-4 and other simulations under 
the assumptions given above, we believe a randomize-by-school design would 
be best. Under this scenario, with the estimated variance components given 
in the left panels of Figures 3 and 4, design 1 yields empirical standard 
deviation estimates of 0.9 for the teacher model, and 0.8 for the student 
model. 

Although we did not include teacher or student covariates in the macro,
they would likely be available to most researchers conducting an educational
study and could be incorporated by taking $\sigma_s^2$ and $\sigma_t^2$ to be variance components
of residuals after adjusting for known covariates. This has the effect of
reducing the cluster effects, since some of the school-to-school and
teacher-to-teacher variability can be explained by covariates such as socio-economic
status, years of teacher experience and other variables. The inclusion of
covariates in the model should increase the power for detecting treatment
differences [Bloom, Richburg-Hayes and Black (2005)].

[Figure 2. Horizontal axis in both panels: treatment variance over 1000 simulations (0.0 to 2.0).]

Fig. 2. Student model: compare smoothed density of treatment variance for the three
randomization designs. Left panel: $\sigma_s^2 = 1.6$, $\sigma_t^2 = 14.4$, student-level residual variance 14.4.
Right panel: $\sigma_s^2 = 4.8$, $\sigma_t^2 = 11.2$, student-level residual variance 11.2. The randomize-schools
design is nearly as efficient as the other two designs for the student model when the
between-school variance is expected to be low (left panel).

Fig. 3. Contaminated teacher model: compare smoothed density of treatment variance
for the three randomization designs, with $q = 0.5$. Left panel: $\sigma_s^2 = 1.6$, $\sigma_t^2 = 14.4$. Right
panel: $\sigma_s^2 = 4.8$, $\sigma_t^2 = 11.2$. Contamination makes the randomize-within-schools and
completely-randomized designs less efficient, but does not affect the randomize-schools design
(as compared to Figure 1).




[Figure 4. Horizontal axis in both panels: treatment variance over 1000 simulations.]

Fig. 4. Contaminated student model: compare smoothed density of treatment variance for
the three randomization designs, with $q = 0.5$. Left panel: $\sigma_s^2 = 1.6$, $\sigma_t^2 = 14.4$,
student-level residual variance 14.4. Right panel: $\sigma_s^2 = 4.8$, $\sigma_t^2 = 11.2$, student-level
residual variance 11.2. Contamination makes the randomize-within-schools and
completely-randomized designs less efficient, but does not affect the
randomize-schools design (as compared to Figure 2).

6. Conclusion. The teacher-student conduit is crucial to the success of
educational interventions such as Project Pathways, making it particularly 
important to evaluate the effect of the intervention on both teachers and stu- 
dents. There are clearly issues other than efficiency involved in the choice of 
evaluation design for Project Pathways. Among these are ease of implemen- 
tation, school and teacher compliance, attrition and contamination, and the 
perceived fairness of the mechanism of randomization itself. Because many 
interventions such as Project Pathways encourage community-based activi- 
ties, there can be a risk of contamination if the randomization is performed 
within schools or if a completely randomized design is used. These concerns, 
together with the need to measure impact on students, suggest that it may 
be beneficial for such studies to consider randomization at the school level 
even though that design may be less efficient for measuring the effect of the 
intervention on teachers. 

In the simulations in Section 5 we assumed that students are assigned 
randomly to teachers within a school. In many design implementations, in- 
cluding Project Pathways, we would expect this assumption to be reason- 
able, since there is little reason to believe that choice of teacher would be 
influenced by the teacher's assignment to group. In other studies, however, 
self-selection of teachers by students may be more of a concern. If good 
students disproportionately choose teachers in the experimental group, the 
estimates of treatment effect will be biased. In such a situation we recom- 
mend a cluster-randomized trial, such as design 1. 

We note that while this study used random effects models to describe 
dependence among teachers and students, if information about teacher in- 
teractions is available, dependence and contamination could alternatively be 
modeled using an adjacency matrix as in social network analysis 
[Wasserman and Faust (1994)]. Hoff (2003) presents random effects mod- 
els to express the dependence found in social network data. 

If the interest is mostly in the effect of the intervention on teachers,
the most efficient designs randomize the assignment of treatments
within schools. But these designs are not optimal for estimating the effect
of the intervention on students — indeed, in certain cases when teachers are 
randomized within schools, the effect of the intervention on students is not 
even estimable. To estimate effects on students, it may be better to use a de- 
sign in which randomization is performed at the school level when a student 
takes classes from multiple teachers at the school. 

In this paper we discussed design issues in the context of an educational 
study. The results, however, are general and can be applied to any setting 
in which data are collected at multiple levels. One application, for example, 
would be clinical trials in which patients are treated by several health care 
practitioners.

SUPPLEMENTARY MATERIAL

Supplement A: Proof of information for student model when $\mathbf{D}_i$ is balanced
(DOI: 10.1214/08-AOAS216SUPPA; .pdf). Proofs of equations (3.7)
and (3.8) appear in a supplementary file posted at the journal website.

Supplement B: SAS program for simulation of anticipated variance (DOI:
10.1214/08-AOAS216SUPPB; .zip). A SAS macro lets the user input simulation
scenarios, including variance components, number of schools, number
of teachers and students at each school, and contamination coefficient, then
generates comparison graphs of the density of the anticipated variance for
the three randomization designs under the teacher and student models, along
with empirical power estimates.